About me

Hi! I’m Ying. I am an Assistant Professor in the Department of Statistics and Data Science at the Wharton School, University of Pennsylvania.

I obtained my PhD in Statistics from Stanford University in 2024, advised by Professors Emmanuel Candès and Dominik Rothenhäusler. Prior to that, I studied Mathematics at Tsinghua University. Before joining Wharton, I spent one year as a Wojcicki-Troper Postdoctoral Fellow at the Harvard Data Science Initiative, where I had the good fortune to work with Professors José Zubizarreta and Marinka Zitnik.

I currently help organize the Online Causal Inference Seminar.


Research interests

In many modern settings, AI systems act as imperfect proxies: they select cases to prioritize, label data at scale, or generate hypotheses that humans act on. I study statistical methods for inference with such AI-mediated data and decisions, organized around two connected themes:

  • Uncertainty quantification and quality control for AI models
    Designing uncertainty-quantification and error-control procedures that decide when AI predictions should, and should not, be trusted under explicit budgets of error or harm. Applications include predictive screening in drug discovery, medical AI, and generative design.

  • Agentic scientific discovery
    Developing the statistical foundations for AI systems that generate and prioritize scientific hypotheses from multi-source, large-scale data, aiming to build agents that are both creative and statistically credible. See POPPER.

Methodologically, my work combines tools from conformal prediction, selective inference, and causal inference. Much of my current work is motivated by collaborations in medicine and the life sciences (e.g., medical foundation models, pathology, and drug discovery), but the methods are intended to provide a general statistical foundation for reliable AI use in high-stakes settings.


News

  • Nov 2025: We develop Cross-Balancing, a method for constructing covariate-adjustment weights in observational studies with data-driven balancing features, obtained via fitted functions, variable selection, or both, and optionally combined with expert knowledge. It offers efficient estimation, valid inference, multiple chances to get the model right, and reduced bias (unique to balancing weights)!

  • Sep 2025: Our paper on the predictive role of covariate shift in generalizability is accepted to PNAS! Analyzing two large-scale multi-site replication projects, it suggests a predictive, rather than explanatory, role for covariate shift: it is informative of the strength of unknown conditional shift, even though it does not explain away all of the distribution shift between sites. See my blog post here!

  • May 2025: I’m organizing an invited session on generalizability, transportability, and distribution shift at ACIC 2025!

  • Apr 2025: I gave a talk on our POPPER agent framework at the International Seminar on Selective Inference! [slides] [recording]

  • Feb 2025: Imagine LLM agents for scientific discovery: agents that autonomously gather knowledge through creative reasoning and flexible tool use. How can we ensure the soundness of what they acquire? We propose POPPER, a framework in which LLM agents design sequential experiments, collect data, and accumulate statistical evidence to validate a free-form hypothesis with error control!

  • Sep 2024: Outputs from black-box foundation models must align with human values before use. For example, can we ensure that only human-quality AI-generated medical reports are deferred to doctors? Our paper Conformal Alignment is accepted to NeurIPS 2024!

  • Sep 2024: My paper on optimal variance reduction in online experiments (a 2021 internship project at LinkedIn) receives the 2024 Jack Youden Prize for the best expository paper in Technometrics! Thank you, ASQ/ASA!

  • Mar 2024: How do we quantify uncertainty for an “interesting” unit picked by a complicated, data-driven process? Check out JOMI, our framework for conformal prediction with selection-conditional coverage!

  • Sep 2023: I’ll be giving a seminar at Genentech on leveraging Conformal Selection [1, 2] for reliable AI-assisted drug discovery.

  • Sep 2023: Scientists often attribute differences in effects between two studies, e.g., replicability failures, to distribution shifts. But do these shifts really contribute? See our preprint for a formal diagnosis framework, play with our live app, or explore our data repository! I gave an invited talk about this work at the Causality in Practice Conference.

Beyond academics, I love traveling and photography. See my photography gallery!

